REMARKS 

Amendments to the Specification 

Applicant has amended the specification merely to correct simple typographical or 
grammatical errors which would be clear to one skilled in the art. 

At page 8, line 12 and page 9, line 27, Applicant has corrected the spelling of 
"principal." 

At page 21, line 7, Applicant has corrected the spelling of "searches." 

At page 22, line 6, Applicant has corrected the date of the referenced citation. A copy 
of the reference is being supplied for the Examiner's convenience with the accompanying 
Information Disclosure Statement, as Reference AT. 

At page 22, line 21, Applicant has corrected the spelling of "minimal." 

At page 23, line 16, Applicant has changed "at" to the grammatically correct "as." 

At page 27, line 6, Applicant corrects the expression "2X-d". Applicant respectfully 
submits that a skilled artisan's understanding of the described technique would be sufficient 
to realize that the expression in the application as filed is not correct and should be replaced 
in the manner suggested. 

At page 29, line 7, page 29, line 9, page 46, line 1, page 50, line 20 and page 60, line 
13, Applicant has corrected the spelling of "choose." 

At page 35, lines 2 and 3, Applicant has deleted superfluous text from the heading. 

At page 35, line 4, Applicant has added a list item "A)" to the sub-heading. 

At page 42, line 17, Applicant has replaced the occurrence of "i.e." by the more 
grammatically appropriate "e.g.". 

At page 47, line 22, page 52, line 10, page 58, line 19 and page 71, line 31, Applicant 
has corrected a number of informalities relating to grammar and syntax. 

At page 60, line 10, Applicant has corrected the spelling of "closest." 

At page 70, line 19, Applicant has corrected the incorrect number of parentheses 
presented in the equation. 

At page 71, Applicant has deleted the trailing quotation mark from "painted". 

In summary, it would be clear to one skilled in the art that the above-described 
amendments are merely obvious corrections to small typographical defects in the 



-36- 



CAl -272698.1 




specification as filed. Accordingly, no new matter is believed to be introduced by this 
amendment. 

Amendments to the Claims 

Claim 1 is pending in the instant application. Applicant amends claim 1 to 
more particularly recite and distinctly claim that which he considers to be his invention. New 
claims 2-139 have been added to more particularly point out and distinctly claim that which 
Applicant regards to be the invention. Applicant submits that the above-made amendment 
and new claims are fully supported in the instant application as originally filed, and do not 
constitute new matter. 

Conclusion 

With this Amendment, Applicant has amended claim 1 and has introduced 
new claims 2-139. The subject matter of the new claims is fully supported in the 
specification and no new matter is added. Accordingly, Applicant respectfully requests that 
the above-made amendments be entered into the file history of the instant application. Upon 
entry of the amendments, claims 1-139 will be pending in the instant application. An early 
allowance is earnestly requested. 

APPENDIX A: 
CHANGES TO SPECIFICATION 
UPON ENTRY OF THE PRELIMINARY AMENDMENT 



U.S. PATENT APPLICATION SERIAL NO. 09/644,937 
(ATTORNEY DOCKET NO. 9476-003-999) 

The following mark-up scheme is adopted: 
Deleted material: Strike-through. 
Inserted material: Bold Underline 

The paragraph beginning at page 8, line 5 is revised as follows: 

An additional aspect considered by Mestres et al. is the 
issue of molecules existing in multiple structural 
conformations, i.e. energetically there may be more than one 
possible structure for a given molecule. Mestres et al. 
calculate the similarity indexes of all pairs of conformations 
of a molecule and perform what is known as principle principal 
component analysis (PCA). They do this to find 
representatives of all possible conformations that are most 
distinct. Although this procedure is really akin to finding 
the dimensionality of the space in which these conformers 
exist, Mestres et al. do not use PCA for this purpose, but 
merely to cluster the conformers. They do not apply PCA to 
sets of different molecules, only to conformers of the same 
molecule, and they do not use any other "metric" property of 
their similarity measure. In fact they seem unaware of such. 

The paragraph beginning at page 9, line 20, and carrying over to page 10, line 12 is revised 
as follows: 

A metric distance may also be used in a technique called 
"embedding". The number of links between the elements of a 
set of N elements can be shown to be N*(N-1)/2 and each link 
can be shown to be a metric distance. While a set of N 
elements has N*(N-1)/2 distances, the set can always be 
represented by an ordered set of (N-1) numbers, i.e. I can 
"embed" from a set of distances to a set of N positions in (N-1) 
dimensional space. This is identical to Principle 
Principal Component Analysis mentioned previously, except that 
with PCA one finds the most "important" dimensions, i.e. the 
"principal" directions, which carry most of the variation in 
position. Typically with PCA one truncates the dimensionality 
at 2 or 3 for graphical display purposes. In general, the 
number of dimensions which reproduces the set of N*(N-1)/2 
distances within an acceptable tolerance may be much smaller 
than (N-1), yet still be greater than 2 or 3. Hence one talks 
of "embedding into a hyper-dimensional subspace", where 
hyper-dimensional means more than 3 dimensions, and subspace means 
less than (N-1). Techniques for such an embedding are 
standard linear algebra. When applied to molecular fields, 
the result of embedding is a shape-space of M < N-1 
dimensions. 

The paragraph beginning at page 21, line 26, and carrying over to page 22, line 8 is revised 
as follows: 

Various techniques exist to attempt to find the best 
overlap of two fields, typically involving repeated searchs 
searches from different starting orientations of the two 
molecules. This is necessary because no direct solution for 
the minimal distance orientation is available, and most 
methods tend to get caught in nearby local minima, missing the 
global minimum. One such technique is a Gaussian technique 
described in J. A. Grant et al., "A Fast Method of Molecular 
Shape Comparison: A Simple Application of a Gaussian 
Description of Molecular Shape," J. Computational Chemistry, 
Vol. 17, No. 14, pp. 1653-66 (19GG) (1996). Using this 
technique, I overlaid the two molecules shown in Fig. 2A to 
produce the result shown in Fig. 2B. 

The paragraph beginning at page 22, line 21 is revised as follows: 

In the following, I may refer interchangeably to maximal 
overlap (or overlay), mimirnal minimal field difference, and 
minimal distance, as they all refer to and measure the same 
optimal orientation of two molecules with respect to each 
other. 

The paragraph beginning at page 23, line 11 is revised as follows: 

In all of my methods for using the field metric, the 
steric field for each molecule is constructed either from a 
sum of Gaussians centered at each atom, or as one minus the 
product of one minus each such Gaussian. These are referred 
to at as the "sum form" and "product form" respectively. The 
product form has the advantage that it removes excess internal 
overlap and hence is smoother inside. The sum form has the 
advantage that it is numerically simpler. Each Gaussian is 
such that its volume is the same as that of the atom it 
represents, and the volume, as of any field, is calculated 
from the integral of the function over all space. 
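
By way of illustration only, the two field constructions described above might be sketched as follows. The sketch is not taken from the specification as filed. It assumes a spherical Gaussian of the form amplitude * exp(-width * r^2), chooses the width so that the integral over all space equals the volume of a sphere of the given atomic radius, and assumes a peak amplitude of 1.0 so that the product form stays between 0 and 1. The names gaussian, sum_form and product_form are hypothetical.

```python
from math import exp, pi


def gaussian(center, radius, amplitude=1.0):
    """Return a Gaussian whose integral over all space equals the atom's volume.

    Assumption of this sketch: g(r) = amplitude * exp(-width * |r - center|^2),
    whose integral is amplitude * (pi / width) ** 1.5.
    """
    volume = 4.0 / 3.0 * pi * radius ** 3
    width = pi * (amplitude / volume) ** (2.0 / 3.0)

    def g(point):
        r2 = sum((a - b) ** 2 for a, b in zip(point, center))
        return amplitude * exp(-width * r2)

    return g


def sum_form(gaussians, point):
    """Steric field as a plain sum of the atomic Gaussians."""
    return sum(g(point) for g in gaussians)


def product_form(gaussians, point):
    """Steric field as one minus the product of (one minus each Gaussian)."""
    prod = 1.0
    for g in gaussians:
        prod *= 1.0 - g(point)
    return 1.0 - prod


# Two overlapping atoms; evaluate both forms at the midpoint between them.
atoms = [gaussian((0.0, 0.0, 0.0), 1.5), gaussian((1.0, 0.0, 0.0), 1.5)]
midpoint = (0.5, 0.0, 0.0)
print(sum_form(atoms, midpoint), product_form(atoms, midpoint))
```

Under these assumptions the sum form counts the overlap between the two atoms twice, while the product form removes it, which is consistent with the statement that the product form is smoother inside.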

The paragraph beginning at page 26, line 11 and carrying over to page 27, line 11 is 
revised as follows: 

For example, if I have 1000 molecules in my database I 
might organize this information thus: select 10 "key" 
molecules which are quite different in shape. For each of 
these 10 key molecules I then find the distance from each of 
these molecules to every other molecule in the database, and 
make 10 lists where each list has a different key molecule at 
the top and the rest of the 999 molecules are listed in order 
of shortest distance from it. To find the closest match 
between a test molecule and the 1000 molecules of the database 
I begin by determining the metric distances between the test 
molecule and each of the key molecules. Suppose the shortest 
distance is to key molecule 6 and that distance is X. I now 
begin to calculate the distances to the rest of the molecules, 
but in the order specified by that key molecule's list. Since 
the list has molecules close to key molecule 6 first, it is 
likely these are also close to my test molecule. Furthermore, 
by the triangle inequality, since molecules which are a 
distance greater than 2X from key molecule 6 must be greater 
than X from my test molecule, I only have to go down the list 
until this condition is satisfied, i.e. I may not have to test 
all 1000 molecules. Furthermore, if I find a molecule closer 
than key molecule 6 early in the list, say distance X-d, then 
I only have to go down the list until the distance from the 
key molecule is greater than 2X-d 2(X-d), i.e. I can refine 
the cutoff distance as I progress down the list. Thus I can 
search the database, by shape, in a time sublinear with the 
number of molecules in the database. These methods are not 
possible without evaluating a shape space description of the 
set of molecules that comprise the database. 
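
The key-molecule search described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the procedure of the specification: points in the plane and the Euclidean distance stand in for molecules and the field metric, and the names build_key_lists and closest_match are hypothetical. The cutoff used in the sketch is X + BEST, which follows directly from the triangle inequality and equals 2X when BEST = X.

```python
from math import dist  # Euclidean distance, standing in for the field metric


def build_key_lists(points, key_ids):
    """For each key, list every other object sorted by its distance from that key."""
    lists = {}
    for k in key_ids:
        others = [(dist(points[k], points[i]), i)
                  for i in range(len(points)) if i != k]
        lists[k] = sorted(others)
    return lists


def closest_match(points, key_lists, query):
    """Return (best distance, best index, number of distance evaluations)."""
    # Distance from the query to each key; the nearest key gives X.
    to_keys = sorted((dist(points[k], query), k) for k in key_lists)
    x, key = to_keys[0]
    best, best_id = x, key
    evaluations = len(to_keys)
    # Walk the nearest key's list in order of increasing distance from the key.
    for d_key, i in key_lists[key]:
        # Triangle inequality: d(query, i) >= d_key - x, so once
        # d_key > x + best nothing further down the list can beat best.
        if d_key > x + best:
            break
        d = dist(points[i], query)
        evaluations += 1
        if d < best:
            best, best_id = d, i  # refine the cutoff as the search proceeds
    return best, best_id, evaluations
```

For example, with a few hundred random points and four keys, closest_match returns the same best distance as an exhaustive scan while typically evaluating only a fraction of the distances; the saving depends on how the data are distributed and on the choice of keys.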

The paragraph beginning at page 29, line 6 is revised as follows: 

a) Chose Choose the number of EGFs that I want to represent 
the field. 

The paragraph beginning at page 29, line 8 is revised as follows: 

b) Chose Choose random positions for the center of each EGF 
and make each spherical, i.e. a=b=c=1. 

The heading at page 35, line 1 is revised as follows: 

1: Finding the maximal overlap (minimal field difference) 
between two fields A and B difference between two fields 
A and B 

The sub-heading at page 35, line 4 is revised as follows: 

A) Exhaustive Search: 

The paragraph beginning at page 42, line 6 and carrying over to page 43, line 3 is revised 
as follows: 

Once I have a shape space for N molecules, of dimension M, 
the next step is to calculate the position within this shape 
space for a molecule not used in the construction of that shape 
space. This position is found by analogy with triangulation in 
three dimensions, i.e. if one has a set of distances from an 
object to four reference objects the exact position can be 
ascertained. In two dimensions one needs three distances. In 
M dimensional shape space one needs M+1 distances. (In each of 
these cases, the M+1 distances must be from points which cannot 
as a set be described at a dimensionality less than M, i.e. 
e.g. for the case of three dimensions, the four reference 
points cannot all lie in a 2 dimensional plane). The actual 
procedure for going from distances to a position is simply that 
a linear equation for the coordinates can be generated from 
each distance, such that the solution of the set of such 
produces the position. This set of linear equations can be 
solved by any standard method, for instance, Gauss-Jordan 
elimination (see, for example Stoer and Bulirsch, "Introduction 
to Numerical Analysis", 2nd Ed., Springer-Verlag, chapter 4). 
An important note here is that this procedure can fail, i.e. it 
will produce a position which will underestimate the M+1 
distances by a constant amount. This is an indication that the 
structure under study actually lies in a higher dimensional 
space than the shape space previously constructed. As such, 
that shape space needs to be extended. 
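
The passage from M+1 distances to a position can be illustrated as follows. This is a minimal sketch under assumptions not stated in the specification: the reference coordinates are Euclidean, and the linear equations are obtained by subtracting the squared-distance equation of the first reference point from that of each other reference point, which gives 2*(p_i - p_0).x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2. The names gauss_jordan and locate are hypothetical.

```python
from math import dist


def gauss_jordan(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Reference points span fewer than M dimensions.
            raise ValueError("reference points are degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def locate(refs, dists):
    """Position in M dimensions from distances to M+1 reference points."""
    p0, d0 = refs[0], dists[0]
    sq0 = sum(c * c for c in p0)
    a, b = [], []
    for p, d in zip(refs[1:], dists[1:]):
        a.append([2.0 * (pc - p0c) for pc, p0c in zip(p, p0)])
        b.append(sum(c * c for c in p) - sq0 - d * d + d0 * d0)
    return gauss_jordan(a, b)


# Three dimensions: four reference points that do not lie in one plane.
refs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
target = (0.2, 0.5, -0.3)
position = locate(refs, [dist(r, target) for r in refs])
print(position)
```

In this sketch, comparing the distances recomputed from the returned position with the input distances is one way to detect the failure case mentioned above, in which the object lies outside the M dimensional space.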

The paragraph beginning at page 46, line 1 is revised as follows: 

(i) Chose Choose a structure at random from the N possible 
structures. 

The paragraph beginning at page 47, line 19 is revised as follows: 

(ii) From the set of N structures, select K key structures that 
are quite different from each other (i.e. are remote from 
each other in shape space). For instance, the structures 
may simple simply be different from each other in total 
volume, or be chosen by more computationally intensive 
methods, e.g. as representatives of clusters of molecular 
shapes found by standard clustering techniques (e.g. 
Jarvis-Patrick, etc). These more sophisticated methods 
may be greatly speeded if the shape space has been 
determined. 

The paragraph beginning at page 49, line 21 and carrying over to page 50, line 3 is revised 
as follows: 

Thus I can search the database, by minimum field 
difference, in a time sublinear with the number of molecules in 
the database. This is because, by the triangle inequality, I 
know the cutoff distance for evaluating structures in the list 
is at most equal to 2X (when BEST = X) and is potentially 
further refined as I progress down the list and find better 
(smaller) values for BEST. As noted above, the list creation 
process can be speeded if the shape space of the structures has 
already been determined. Whether the time saved will be 
justify justified by the time spent constructing the shape 
space depends on the number of key structures K and the number 
of structures in the database. 

The paragraph beginning at page 50, line 20 is revised as follows: 

(ii) Chose Choose a structure at random from this set and 
record its name in the zero level node of a tree structure 
which is such that each "node", or "slot", has two child 
nodes, called "left" and "right", at what I refer to as a 
level one greater than this node. 

The paragraph beginning at page 52, line 10 is revised as follows: 

In (1) above, rather than sc choosing structures at random for 
insertion into the tree, they could instead be sorted into a 
list, for example in order of increasing volume, and then taken 
sequentially from the list for insertion into the tree. This 
allows additional criteria to be used to terminate a search of 
the branches of the tree. 

The paragraph beginning at page 58, line 19 is revised as follows: 

(vi) If the number of EGF's use used in (ii) is greater than 
one check to see if this fragment adjusted EFF is greater 
than BEST. If so then quit the procedure, otherwise 
increment the number of EGF's to be used in (ii) by one 
and return to (ii). 

The paragraph beginning at page 60, line 7 is revised as follows: 

(iv) For each of the four alignments, make the atom to atom 
assignments for the atoms which belong to the pair of 
EGF's being aligned together based upon "closet" "closest" 
or "closest of similar type". 

The paragraph beginning at page 60, line 12 is revised as follows: 

(v) Rather than have an infinite number of possible alignments 
I now have just four to chose choose from, and given any 
kind of measure for the assignment (e.g. minimize the sum 
of the distances of each atom pair) this is 
straightforward. 

The paragraph beginning at page 67, line 6 is revised as follows: 

(ix) Otherwise actually find the best metric field difference 
between the new molecule and the current database 
structure. If this value is less than BEST, set BEST 
equal to this value, set the value of BESTSTRUCTRE 
BESTSTRUCTURE to indicate this structure. Go to (v) 
unless this is the last structure in the database. 



The paragraph beginning at page 70, line 11 is revised as follows: 

(i) Define a fitting function f between any two EGF's such 
that if both were spherical this function would be a 
minimum when the inter-EGF distance is the same as the sum 
of the radii of each EGF (defining the radii of the EGF as 
that of a sphere of equivalent volume). Such a function 
for two EGF's, EGF1 and EGF2, is: 

f = a*V - b*(Q(EGF1, V) - b*(Q(EGF2, V) Q(EGF2, V)) 
where V = Q(EGF1, EGF2) where Q is defined in equation 
(6) above. 

The paragraph beginning at page 71, line 29 and carrying over to page 72, line 2 is revised 
as follows: 

This procedure produces a series of single EGF descriptions of 
the active site. These EGF's may be painted", based upon 
properties of the nearest proteins protein atoms, or of any 
field quantity generated by such atoms, e.g. electrostatic 
potential. 

APPENDIX B: 
CHANGES TO CLAIMS 
UPON ENTRY OF THE PRELIMINARY AMENDMENT 



U.S. PATENT APPLICATION SERIAL NO. 09/644,937 
(ATTORNEY DOCKET NO. 9476-003-999) 

The following mark-up scheme is adopted: 
Deleted material: Strike-through. 
Inserted material: Bold Underline 

1. A computer-implemented method of finding the closest match, in a group of 
N objects, those objects whose minimal metric distance from between a first object and N 
objects is less than a threshold distance, X, comprising the steps of: 

selecting a small number M of the N objects, wherein M is much less than N 
and wherein M is a dimensionality of a shape space of the group of objects and wherein 
said number M of objects represents said shape space; 

for each of the objects M determining its metric distance to all the other N 
objects; 

for each of the objects M, making an ordered list of the minimal metric 
distances between that object each of the M objects and all each of the other N objects; 

determining the minimal metric distances between the first object and each of 
the M objects, thereby identifying a second object of said M objects that has the smallest 
minimal metric distance between itself and the first object; 

determining the calculating a minimal metric distances distance between the 
first object and the objects at least one object on the ordered list associated with the said 
second object M that has the shortest metric distance between it and the first object, by: 

said metric distances being determined beginning with the object on 
the said ordered list that has the shortest smallest minimal metric distance between 
it and the said second object M and continuing such determination with objects 
having increasingly greater minimal metric distances from the said second object M 
until an object is reached that has a minimal metric distance from the said second 
object M that is more than twice the minimal metric distance from the said first 
object to the said second object fr f, or whose minimal metric distance from said 
second object is more than twice the threshold distance from the first object; 

repeating said calculating step wherein said second object has a 
next smallest minimal metric distance from the first object until each of said M 
objects has been considered; and 

selecting those objects from said calculating step whose minimal 
metric distance from the first object is less than X . 